Datalicious Depths of Delectable Data


Unfortunately, within this data set, Open Psychometrics did not provide us with specific results (e.g. SCUEI), but with a bunch of numbers. In fact, this is what the data looks like:

I’ve only selected the first 6 responses. Not too bad, right? But, you’ve gotta imagine about 50 more columns, and about a million more rows. It’s a little more intimidating now, but we’ve gotta start somewhere >:)

First, let’s calculate the scores for the Extroversion personality trait. We can do this by summing up the scores of the 10 EXT (standing for EXTroversion) questions. Of course, it should be noted that answering a ‘4’ or ‘5’ on an EXT question doesn’t always mean that a user is more extroverted. For example, compare EXT Question 2 (“I don’t talk a lot”) with EXT Question 1 (“I am the life of the party”).

So, we’ll need to make sure to add and subtract scores as necessary - I’ve arbitrarily made the decision to add ‘points’ if a question symbolizes extroversion, while subtracting ‘points’ if a question is correlated with introversion.

Hence, after adding up all the scores, any positive score (> 0) will mean that the user receives an ‘S’ for Sociable, while a negative score (< 0) would result in an ‘R’ for Reserved. If their score is 0, then they’d receive an ‘X’, as their results can’t really be evaluated since they’re perfectly in-between!
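A rough sketch of this scoring in Python (the site itself was built in RStudio, so this is not the original code). The keying is an assumption: only EXT1 (extroverted) and EXT2 (introverted) are confirmed above, and the sketch simply guesses that the remaining questions alternate the same way. The names `EXT_SIGNS`, `ext_score` and `ext_letter` are made up for illustration.

```python
# Assumption: odd-numbered EXT questions signal extroversion (+) and
# even-numbered ones signal introversion (-). Only EXT1 and EXT2 are
# confirmed by the text; the rest of the keying is a guess.
EXT_SIGNS = {f"EXT{i}": (1 if i % 2 == 1 else -1) for i in range(1, 11)}


def ext_score(answers):
    """Add answers to extroverted questions, subtract answers to introverted ones."""
    return sum(sign * answers[question] for question, sign in EXT_SIGNS.items())


def ext_letter(score):
    """Map a score to 'S' (Sociable), 'R' (Reserved) or 'X' (exactly in between)."""
    if score > 0:
        return "S"
    if score < 0:
        return "R"
    return "X"


# A hypothetical respondent who answers 3 (Neutral) to every question.
neutral = {question: 3 for question in EXT_SIGNS}
print(ext_score(neutral), ext_letter(ext_score(neutral)))  # prints: 0 X
```

With five questions on each side, answering the same number everywhere always cancels out to 0, which is why an all-neutral respondent lands on ‘X’.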

Sample table of results, where extscore and extletter are calculated using the raw data:


Visualizing this data into a graph:

I’m going to refrain from any analysis on WHY there are more R’s than S’s until the final results. But, it’s a pretty 50/50 split!

The same process will be applied to the other personality traits1, with the graphs of results located below:

After getting these values, we can now combine them together to get the results. I’m also going to limit this to the top 25 results… or else the data just gets super messy.


WOOO! Congratulations to the XXOAI family for being among the most common results out of 243 possibilities! If you’d prefer to see a more numerical visual, I’ve provided a table containing the 10 most common results below:

Table 1: The 10 Most Common Results
Results Amount of Responses Percentage of People
SCOAI 160758 15.832892
RLOAI 132305 13.030585
RCOAI 106355 10.474796
RLUAI 98833 9.733962
SLOAI 96998 9.553234
SLUAI 72098 7.100859
SCUAI 54528 5.370407
RCUAI 40040 3.943499
RLXAI 14370 1.415287
SXOAI 11818 1.163943
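For the curious, a table like this could be built roughly as follows. This is a Python sketch, not the original R code; `top_results` and the four sample respondents are made up for illustration.

```python
from collections import Counter

# Each of the 5 traits has 3 possible letters, so there are 3**5 = 243 results.
POSSIBLE_RESULTS = 3 ** 5


def top_results(letter_rows, n=10):
    """Join each respondent's five letters and return (result, count, percent) rows."""
    counts = Counter("".join(letters) for letters in letter_rows)
    total = sum(counts.values())
    return [(result, count, 100 * count / total)
            for result, count in counts.most_common(n)]


# Made-up letters for four hypothetical respondents.
sample = [
    ("S", "C", "O", "A", "I"),
    ("S", "C", "O", "A", "I"),
    ("R", "L", "O", "A", "I"),
    ("R", "L", "U", "A", "I"),
]
for result, count, percent in top_results(sample):
    print(result, count, round(percent, 1))
```

The percentage column is just each count divided by the total number of respondents, so the full 243-row version would add up to 100.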

We can also look at the ‘average’ scores of each personality trait.2

EXT EST CSN AGR OPN
avgscore -0.2933333 -0.2266667 2.36 12.65333 6.533333

Let’s try to analyze these results.

Extroversion

There seems to be a nice mix of S(ocial) and R(eserved) values, which is representative of the larger population. It’s the most balanced compared to the other 4 traits, and actually mirrors what is expected. This is likely because it’s quite easy for people to decide if they’re ‘introverted’ or ‘extroverted’, and the questions (e.g. “I start conversations”) tended to be quite straightforward and are experiences that they would’ve gone through in their daily life.

When looking at TABLENUM, there are technically about 40000 more R’s than S’s. If I wanted to stereotype (which is an inevitable part of using ‘absolute letters’, rather than something like 70% R vs 30% S), maybe the S’s aren’t as likely to spend 20 minutes on an online test, compared to R’s, who are more likely to be holed up in their room.

Neuroticism

Once again, a solid sample of results between C(alm) and L(imbic). I think it’s similar to extroversion, where it’s quite easy to determine how strongly you feel your emotions, and your ability to control them.

Scrolling up to TABLENUM, there are more limbic people, compared to calm, by about 60000. I’m not really surprised. In fact, I’d think there’d be even more limbic people, as society slowly puts more emphasis on mental health and actually exploring our emotions. This could make us more moody - we actually try to work through our emotions, rather than repressing them, and pretending to be calm. However, it could also be the other way around! By engaging with our emotions, we can better understand ourselves, and lead a life where we are calmer since we build control!

Conscientiousness

Same as above, for O(rganized) and U(nstructured). Relatively simple to identify, applicable in daily life.

There’s a lot more O’s than U’s, which I’m also surprised about? There’s also the most X’s in TABLENUM, compared to others. The X’s, I think, can be explained since people fluctuate between having super organized lives, but a messy desk? It’s easy to be both organized and unorganized. As for the large amount of O’s… It’s clearly not represented as much in TABLENUM, considering there’s basically an equal number of each response. It could be because people want to see themselves as organized, or they think they’re organized (e.g. they know where everything is on their desk, but items are actually strewn all over), when in reality, they aren’t! A good thought to carry for the next two personality traits.

Agreeableness

This is where it gets a bit funny! There is not a SINGLE E(gocentric) - it’s just a sea of A(greeable). This is probably because this is a self-administered test, people want to be seen as agreeable in society (correlated with ‘niceness’), and people are biased.

Let me tell you a story that I heard from a TEDTalk. There’s this guy, I think he’s a magician, and he’s at the airport waiting. Something bad has happened that I can’t recall: Maybe his flight has been delayed by 2 hours, or he really needs a refund, or he’s lost his luggage. So, he calls the airline, and the customer representative is clearly in a bad mood, maybe a little bit angry or grouchy, and he’s getting nowhere with his request. The guy hears her voice, and realizes: “Oh, the representative must think that I’M so lucky (despite how she sounds) that I’M talking to her, because she’s taking SO MUCH TIME out of HER day to help me.” So, the guy goes “Hey! I really appreciate you for helping me, and I’m really grateful that I’m talking to you!” The representative INSTANTLY is in a better mood, becomes genuinely helpful, and ends up solving his request, after he acknowledges her struggles and is actually really kind to her, since she’s far more used to being yelled at!

So, aside from being a heart-warming story, it also serves to prove two things:

So, it makes sense why, if we’re inputting our own answers without requiring proof, people can naturally skew themselves into thinking that they’re nicer than they actually are. I think the same concept applies to agreeability - we like to think that we’re agreeable, because it’s correlated with niceness, and we like being nice. Not only that, but we want to be liked, and oftentimes, by agreeing with others, people have a nicer opinion of us! Lastly, it’s a lot easier to agree with others, than to go against the status quo. Maybe people do disagree, but they’ve just repressed their own thoughts because… society.3 The opposite also applies to egocentrism - nobody wants to call themselves egocentric, as it’s associated with narcissism and self-centeredness. They’re traits that nobody wants. Not only that, but who wants to admit that they agree with “I feel little concern for others”?

Openness

It’s anti-egocentrism part 2, except this time, people are refusing to call themselves non-inquisitive! In fact, it’s even worse than before, if you look at TABLENUM! Once again, it goes back to the same ideas:

However, I think one major point that Openness has that Agreeability doesn’t, is just the fact that taking a Big Five personality quiz inherently means that you’re going to be more inquisitive. If you weren’t curious about it in the first place, you just… wouldn’t take the quiz or care about the result. Hence, there’s some sampling bias, as the people you’re getting results from are already skewed towards inquisitiveness. In addition, psychology is seen as something ‘nerdy’ and scientific, which caters to ‘smarter’ or ‘more educated’ audiences.

Hence, someone might agree with “I have a rich vocabulary”, not because they’re interested in learning new words, but just because they perceive themselves as having a lot of knowledge due to their education. Or answer a 1 (disagree) to “I have difficulty understanding abstract ideas” or a 5 (agree) to “I am full of ideas” because, through education, they’ve trained themselves to be better at comprehending abstract ideas or becoming better at brainstorming.

Sheer Statistical Impossibility…?

However, when analyzing this data, it seems... strange that so many XXOAI types are represented. In fact, it feels weird that 15%(!!!!) of people were the EXACT SAME TYPE, despite ALL THE POSSIBILITIES!

So, let’s see how it compares to other, more theoretical, data. Using the data4 from SimilarMinds, we can see how our data stacks up.

Table 2: Theoretical Data
Results Percentage of People
SCOAI 3.4
RLOAI 2.7
RCOAI 3.5
RLUAI N/A
SLOAI 2.4
SLUAI 3.4
SCUAI 3.5
RCUAI N/A
RLXAI N/A
SXOAI N/A
Table 2: Experimental Data
Results Percentage of People
SCOAI 15.832892
RLOAI 13.030585
RCOAI 10.474796
RLUAI 9.733962
SLOAI 9.553234
SLUAI 7.100859
SCUAI 5.370407
RCUAI 3.943499
RLXAI 1.415287
SXOAI 1.163943

Looking at these tables, there’s a clear discrepancy between theoretical and experimental data, where none of the theoretical results really match up to what is seen in our data set. However, most of it can be explained by the previous analysis: Wrong results might occur because of bias and societal norms (cementing the last two letters to be A and I), in addition to the natural disposition of responders. This comparison really just showcases how there’s a high likelihood of bias.

Honestly, I’m not too sure why 15% of people are SCOAI. Maybe people are being influenced to answer what they WANT to be like, rather than what they actually are, since SCOAI seems like one of the most ‘socially-successful’ results. But, in reality, responders aren’t actually SCOAIs. That’s my best guess!5

Also, if you’re interested, here’s a list of the most uncommon results!

Table 3: Most Uncommon Results
Results Amount of Responses
SLXXX 1
XLUXX 1
XLXXX 1
XXOXX 1
XXUXN 1
XXXXN 1
RXXEX 2
RXXXX 2
SXXEN 2
SXXXN 2

No surprise, it’s a lot of results with X’s. It’s pretty hard to get an X, because you’re perfectly in between!

At first, I was surprised that ‘XXXXX’ didn’t appear, since it’s technically the most unlikely.6 However, I wouldn’t be surprised if only 5% of those results were genuine user responses, and the other 95% were just people who kept clicking 3 (Neutral), or had some kind of game to see if they could perfectly answer the questions to get XXXXX.

Do results vary between countries?

This data comes from 224 unique locations (it contains 224 different ISO country codes).7 Let’s dig through this data - a fun bit of stalking!

We can see that the majority of data came from the US, with a whopping total of 546403 respondents. Following behind are Great Britain (GB), Canada (CA), and Australia (AU). This is likely because this quiz is in English, and will generally cater towards countries with English as their primary language.

Google’s SEO (Search Engine Optimization) is also affected by location, and can rank websites by their proximity to the user8. So, it’s possible that Open Psychometrics is American9, and when Americans search up “Big Five Personality Test”, this is the first quiz that shows up.10

But, we’re not really concerned with WHERE people are taking the quiz, but with how it affects answers. Hence, I’m going to find the 10 most popular results for America, Great Britain, Australia, and the Philippines (top respondent countries from different continents) and see how they compare to one another.

Table 4: America
Results Percentage
SCOAI 17.003201
RLOAI 13.229247
RCOAI 11.315458
SLOAI 9.800093
RLUAI 9.042227
SLUAI 6.695974
SCUAI 5.258573
RCUAI 3.796099
RLXAI 1.324846
XCOAI 1.215952
Table 4: Great Britain
Results Percentage
SCOAI 13.212505
RLOAI 12.659920
RLUAI 11.545739
SLOAI 9.745330
SLUAI 9.147697
RCOAI 7.955433
SCUAI 6.004865
RCUAI 3.767494
RLXAI 1.461049
SLXAI 1.144213
Table 4: Australia
Results Percentage
SCOAI 16.713972
RLOAI 12.280632
RCOAI 10.335798
SLOAI 9.558265
RLUAI 9.244453
SLUAI 7.263642
SCUAI 5.602638
RCUAI 3.699780
RLXAI 1.331201
SXOAI 1.215271
Table 4: Philippines
Results Percentage
RLOAI 17.181438
SCOAI 12.042122
RCOAI 10.182899
SLOAI 10.031743
RLUAI 8.706606
SLUAI 4.343226
RCUAI 2.589812
SCUAI 2.030534
RLXAI 1.970071
RXOAI 1.839069

From these tables, it’s pretty clear that the countries generally have similar results, but the Philippines is the only one that really stands out from the rest. Namely, the country has more RLOAI than SCOAI, meaning respondents tended to be more limbic and introverted than their counterparts. It’s difficult to point out why without knowing the average age of the respondents, as that could dictate a lot of their behaviors.

However, this also shows these countries follow the trend - every single one of the results were inside the top 25. Of course, this is the expected result, but one can always wish for a little special surprise, you know?

Let’s try zooming out by plotting the averages on a world map, and then analyzing results.

As for a note on the data: Locations with fewer than 10 responses have been omitted from the data, as they often significantly skewed maps. This removed 58 locations from the map, with Africa losing a pretty big chunk of its land.
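As a rough illustration of that filtering step, here is a Python sketch (not the original R code) that averages scores per location and drops any location with fewer than 10 responses. The function name, the `(country_code, score)` row shape and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

MIN_RESPONSES = 10  # locations with fewer responses than this are dropped


def average_by_location(rows, min_responses=MIN_RESPONSES):
    """rows: iterable of (country_code, score) pairs.

    Returns {country_code: mean score}, skipping any location that has
    fewer than `min_responses` responses.
    """
    grouped = defaultdict(list)
    for country, score in rows:
        grouped[country].append(score)
    return {country: mean(scores)
            for country, scores in grouped.items()
            if len(scores) >= min_responses}


# Made-up data: "US" has 10 responses, "XX" only has 2 and gets dropped.
rows = [("US", score) for score in range(10)] + [("XX", 5), ("XX", -5)]
print(average_by_location(rows))  # prints: {'US': 4.5}
```

The reason for the cutoff shows up in the toy data: with only two responses, a single extreme answer can swing a location's average a lot.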

(#fig:extscores world map) World map of average extroversion scores

Higher scores are correlated with extroversion, lower scores are correlated with introversion.

Table 5: Most Extroverted
region ExtScores
Cuba 2.708333
Greenland 2.382353
Rwanda 2.093750
Ethiopia 1.342960
Afghanistan 1.018519
Norway 0.989608
Table 5: Least Extroverted
region ExtScores
St. Kitts & Nevis -6.333333
Sudan -5.466667
St. Lucia -5.190476
Åland Islands -4.733333
Guyana -4.543478
Bhutan -4.500000

When looking at extroversion, regions that are more ‘globalized’ (e.g. North America and East Asia) are more extroverted compared to Africa and South America. When looking at the most and least extroverted, it seems like there’s a smattering of locations from various corners of the world.

When looking at the list of least extroverted countries, they’re not ‘well-known’ countries on a global scale. Due to this, it’s possible they’ve become more introverted as they don’t regularly encounter new people/foreigners on a day-to-day basis. (Did you know that tourists have to pay 200 dollars per day to stay in Bhutan, which consistently ranks as one of the happiest countries? https://www.oneworldeducation.org/our-students-writing/bhutan-the-worlds-happiest-country/)

(#fig:est world map) World map of average EST (calm vs. limbic) scores

Higher scores are correlated with being calm, lower scores are correlated with being limbic.

Table 6: Most Calm
region estScores
Suriname 4.697674
Eswatini 3.461539
Ethiopia 3.451264
Cape Verde 3.181818
Cuba 3.000000
Papua New Guinea 2.791667
Table 6: Least Calm
region estScores
Jersey -4.476191
Guernsey -4.044444
Syria -3.875000
Samoa -3.727273
Algeria -3.234310
Belize -3.125000

It seems like the south-eastern part of the world is less limbic, specifically Africa and East Asia. I find it funny that China specifically is seen to be relatively ‘calm’, as they are notorious for bad work-life balances11 and the ‘lie down’ movement12 – things that show the ‘mentally-difficult’ conditions that they are living in.

Yet, it’s also reasonable to say that Chinese people are accustomed to high stress when dealing with academic pressures.13 Hence, they might have learned to become more emotionally stable.

(#fig:csn world map) World map of average conscientiousness scores

Higher scores are correlated with organization and conscientiousness, lower scores are correlated with carelessness.

Table 7: Most Conscientious
region csnScores
Samoa 5.545454
Ghana 5.356618
Guyana 5.043478
Cameroon 4.878788
Papua New Guinea 4.875000
Kenya 4.782206
Table 7: Least Conscientious
region csnScores
Bhutan -1.1428571
Libya -0.7058824
Bolivia -0.6657534
Angola -0.5000000
Paraguay -0.2491103
Åland Islands -0.2000000

For conscientiousness, Africa is lit up like a Christmas tree! North America is also pretty light. On the other hand, South America is quite dark.

I also find it interesting that Asia, which is known for its ability to work hard and focus, seems to be quite average. Perhaps, this is where capitalism excels.

Conscientious people are typically correlated with success, as they’re able to complete their work thoroughly. With this hard work, they’re able to succeed more in capitalistic countries.


(#fig:agr world map) World map of average agreeableness scores

Higher scores are correlated with agreeableness, lower scores are correlated with egocentrism.

Table 8: Most Agreeable
region agrScores
Papua New Guinea 17.04167
Rwanda 16.56250
Cameroon 16.45455
St. Lucia 16.42857
Cuba 16.20833
Tanzania 16.17442
Table 8: Least Agreeable
region agrScores
Madagascar 7.500000
Åland Islands 7.533333
Bhutan 9.428571
Poland 9.818162
Cape Verde 9.909091
Belarus 9.927711

Agreeableness looks pretty similar to the conscientiousness graph. It’s difficult to not find the most agreeable countries the most egotistical, as they must think they’re always so agreeable! However, you can just say that these countries value politeness highly.

It’s also interesting to note that Asian countries typically are very respectful of their elders, and bend to their will. This idea is not really represented within this graph.

(#fig:opn world map) World map of average openness scores

Higher scores are correlated with inquisitiveness, lower scores are correlated with indifference.

Table 9: Most Inquisitive
region opnScores
St. Lucia 9.142857
Angola 8.642857
Madagascar 8.333333
Montenegro 8.183099
Seychelles 8.181818
Grenada 8.083333
Table 9: Least Inquisitive
region opnScores
Macao SAR China 3.758278
Cambodia 3.924675
Malaysia 4.201287
Bhutan 4.214286
Nepal 4.568536
Philippines 4.871044

The southern hemisphere seems to be more open compared to the northern hemisphere, with Europe being the exception.

Are All Questions Created Equal?

We can also analyze the questions themselves. A funky little thing that this quiz did, was record how many milliseconds each respondent spent on each question. That seems like a fun thing to look at….

Thus, I present to you: General time spent on each question!

This is a good visual to compare questions to each other, especially questions in the same category.

For your reference:

For those unaware, this is a box plot14. The white box symbolizes the interquartile range (the middle 50% of responses): its bottom edge is the 25th percentile and its top edge is the 75th percentile. The line in the center is the median of the question. It’s really good for visualizing the spread of data (which seems to range quite a bit). However, I’ve trimmed the data by cutting off any values above 20 seconds15 and only took a random sample of 500 response times for each question16.
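To make the box plot numbers concrete, here is a rough Python sketch of the trimming, sampling and quartile steps described above. It is not the original ggplot2 code, `box_stats` is a made-up name, and Python's `statistics.quantiles` may interpolate quartiles slightly differently from the plotting library.

```python
import random
from statistics import quantiles


def box_stats(times_ms, cutoff_s=20, sample_size=500, seed=0):
    """Summarise one question's response times the way a box plot would.

    times_ms: response times in milliseconds (at least two must survive the cutoff).
    Values above `cutoff_s` seconds are dropped, then at most `sample_size`
    values are sampled at random before computing the quartiles.
    """
    seconds = [t / 1000 for t in times_ms if t / 1000 <= cutoff_s]
    if len(seconds) > sample_size:
        seconds = random.Random(seed).sample(seconds, sample_size)
    q1, median, q3 = quantiles(seconds, n=4)  # 25th, 50th, 75th percentiles
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Made-up times: the 60-second response is cut off before the quartiles.
stats = box_stats([1000, 2000, 3000, 4000, 5000, 60000])
print(stats["median"])  # prints: 3.0
```

The box in the plot runs from `q1` to `q3`, so `iqr` is simply the height of the box.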

However, I want to be more precise with these response times, so I’ve specifically pulled the questions that take the longest and shortest times to complete.

Table 10: Questions That Took the Longest
Category Question Time in Seconds
EXT1_E I shirk my duties. 39.30020
AGR7_E I leave my belongings around. 28.86483
CSN3_E I am relaxed most of the time. 17.84777
OPN4_E I feel little concern for others. 9.87530
EXT4_E I don’t like to draw attention to myself. 9.79843
Table 10: Questions That Took the Shortest
Category Question Time in Seconds
OPN1_E I use difficult words. 3.72914
AGR9_E I often feel blue. 3.69145
OPN8_E I have excellent ideas. 3.49578
EST9_E I am full of ideas. 3.23191
OPN10_E I have a rich vocabulary. 2.82167

I’ve taken the liberty of removing EXT_1 (the first question on the quiz), which had an average time of 87 seconds. This is likely because people were getting accustomed to the quiz, had loading time issues, or started the quiz, then immediately forgot about it.
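A ranking like the one in Table 10 could be produced along these lines. This is a Python sketch rather than the original R code, and the function name, column labels and times are all invented for the example.

```python
from statistics import mean


def rank_questions_by_time(times_by_question, skip=()):
    """Return (question, average seconds) pairs, slowest question first.

    times_by_question: {question label: list of times in milliseconds}.
    skip: labels to leave out, e.g. the first question on the quiz.
    """
    averages = {question: mean(times) / 1000
                for question, times in times_by_question.items()
                if question not in skip}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical labels and times in milliseconds.
times = {
    "EXT1_E": [87000, 87000],   # first question, skipped below
    "EXT2_E": [3000, 2000],
    "EXT3_E": [10000, 8000],
}
ranked = rank_questions_by_time(times, skip=("EXT1_E",))
print(ranked)  # prints: [('EXT3_E', 9.0), ('EXT2_E', 2.5)]
```

Taking the first few and last few entries of `ranked` gives the ‘longest’ and ‘shortest’ tables.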

“I shirk my duties” took the longest, likely because ‘shirk’ is an uncommon word and people wanted to search up the definition, which easily could’ve taken an extra 10 seconds. As for the other questions, they’re all relatively long sentences and have more nuance. They’re all very situational, as people are rarely “always relaxed” or “never relaxed”, and it takes time to think whether you’re relaxed a MAJORITY of the time. Same thing for little concern for others - there are tons of people you care about, but also, tons of people you don’t. So, it’s a bit tricky to answer those.

On the other hand, the questions that took the least amount of time are short and straightforward, and are obvious to judge yourself on. You either read a dictionary for fun to seem smart, or you don’t. I do find it interesting that it’s more difficult for people to determine whether they’re ‘relaxed’ than whether they’re ‘blue’. That’s likely because you consciously pay attention to when you’re sad, but not when you’re relaxed? You can also see that these questions are the ones that are asked later (e.g. AGR_9 is part of the 9th out of 10 rounds of questions, meaning that it’ll be really close to the ending of the quiz). Anticipation for results may have caused people to rush through the later questions.

Another interesting thing to note is the distribution of answers for each question. You may have expected each question to have something like a normal distribution (a bell curve). However… that’s actually not the case! Many questions tend to have a distribution like so:

There are four major types of distribution seen in the data:

  1. Logarithmic (First Row)

The columns progressively increase or decrease. I think these graphs are the most “trustworthy”, as you’d only put 1 or 5 (the extremes) if you were incredibly confident in your answer.17 Not only that, but these questions aren’t really ‘shameful’ to admit, like “I get stressed out easily”, which allows people to be honest and pick extremes. Questions also tend to be less ‘situational’ and more ‘specific’, where you can very clearly visualize what you’d be doing in that situation, rather than responding “Oh… sometimes I am, sometimes I’m not!” (E.g. “I am quiet” likely wouldn’t follow this trend, but “I am quiet around strangers” probably does, because it helps narrow down the situation. You also don’t really change your behavior around different strangers, you usually are pretty constant with your behavior.)

Some other examples include:

If you’re wondering which side they skew on…. just trust your gut on it :)

  2. Skewed (Second Row)

I feel like these are best associated with questions where self-bias is most prevalent, as you want to make yourself out to be something you’re not (to make yourself feel better). These questions are also very situational: sometimes I do this, sometimes I don’t. That’s why people tend to move towards the middle. This is the most common distribution type.

Examples include:

  3. “Normal” Distribution (Third Row, Left)

I’m lying to you - this graph isn’t a normal distribution. I’m just calling it a normal distribution because the answer 3 (neutral) is the most common answer. Quite frankly, it just means that people are either confused, or they don’t really have a strong opinion on it, so they’re almost forced to choose 3. These questions seem to be the most ‘observable/objective’ of the bunch. Personally, I try to make a habit of avoiding clicking 3 no matter what (I’m not sure if others are the same), but it’s still interesting to see. I feel like these are not things to be ‘proud’ of or dislike about yourself, nor would you mention them unless prompted, which is likely why this shape is associated with the extroversion questions.

Examples Include:

  4. Relatively Random (Third Row, Right)

So these are the ones that don’t follow a trend. Well, what happens is that 1 and 5 are similar in height and have about 150k responses, while 2, 3, and 4 are also similar in height, with about 250k responses. It’s the kinds of questions that go “Yea… I’m definitely not SUPER ___, but it happens from time to time… I’m not sure how I would compare with other people though… so 2, 3, or 4 sound about right.”

Examples Include:


  1. The neuroticism table was actually calculated differently. For the other 4 traits, there were an equal number of ‘positive (e.g. extroversion) questions’ and ‘negative (e.g. introversion) questions.’ Neuroticism had 2 ‘positive’ questions and 8 ‘negative’ questions. So, I subtracted 3 from each answer, where a score of 5 (Agree) became +2, and a score of 1 (Disagree) became -2. This helped balance the questions, while using a similar process. I tried using this method with the other tables and got the same values as my previous method, which is good to see!↩︎

  2. The average values will actually shift every time I update the website (on RStudio!), as I found a function to randomly grab 100 people then take the average, with the average changing every time I create a version of this webpage!↩︎

  3. I use agreeableness pretty synonymously with niceness. One can critique that isn’t the case, and that agreeability and niceness are more distinct than I make them out to be. That can absolutely be the case, and the story is not relevant. However, the point about society preferring agreeability still stands, and society will still influence how you act and how you perceive yourself.↩︎

  4. This was the only site that had theoretical values. I don’t know where they got their percentages from, and there was also no data on certain combinations. It should also be noted that this site does not use ‘X’ as a possible result. E.g. XLUEI (or any combination with an ‘X’) is not considered. So this source can not be completely trusted. It should also be noted that SimilarMinds separated the theoretical values by female and male. Since this data set does not have this distinction, I used the average of the male and female theoretical values.↩︎

  5. If you have more ideas, I’d love to hear them! Reach out :)↩︎

  6. There’s actually 3794 results for XXXXX↩︎

  7. ‘Country codes’ are kinda misleading. There are ~195 countries, and ISO has 249 different codes. This is because the ISO list also contains dependent territories, e.g. the Cayman Islands (UK) and Christmas Island (Australia)↩︎

  8. https://www.searchenginejournal.com/ranking-factors/physical-proximity-to-searcher↩︎

  9. I couldn’t find any location data on the website itself↩︎

  10. As a Canadian, it’s actually the third search result!↩︎

  11. e.g. the 996 schedule of working from 9am to 9pm, 6 days a week↩︎

  12. Youth in China are adopting the philosophy of ‘lying down’ and giving up, due to the bad job market after graduation. They often feel let down by their society, as they’ve always been taught that studying hard and getting into a good university will lead to a good life. Yet, when they graduate, they struggle to find jobs and keep themselves afloat.↩︎

  13. E.g. To get into university, students take the Gaokao, a two-day standardized test that determines their entire future.↩︎

  14. I actually forgot this type of graph existed, until I was randomly scrolling through the ggplot2 library of different graph types!↩︎

  15. as it’s quite uncommon and makes plotting the graph difficult↩︎

  16. 500 is enough to show the general trend, and utilizing more only makes graphs take longer to load.↩︎

  17. If you ask people to choose a number between 1 and 5, they tend to pick 2, 3, or 4, rather than 1 or 5.↩︎

  18. This one is almost inverted, where 2 and 4 are the most common responses, followed by 1 and 5, then 3.↩︎